Decision Tree | Tree Ensemble | XGBoost

Decision Tree Basics

  • Classify by splitting on features that incrementally cluster similar items together, eventually driving to a final classification
  • Select the feature that most quickly narrows the data down to a decisive result

*(Figure: decision-tree v1)*

Decision 1 : Explore Features

  • A good feature is one that drives the resulting nodes toward a pure class (all examples in one class) more quickly

Decision 2: When to stop splitting

  • Stop when further splitting does not improve purity beyond a minimum threshold

Concept of Entropy

  • Measurement of impurity of a node
  • p1 = 0 and p1 = 1 are the most pure; p1 = 0.5 is the most impure
  • Equation : `H(p1) = -p1 * log2(p1) - (1 - p1) * log2(1 - p1)`, where p1 is the fraction of positive examples at the node and p0 = 1 - p1

*(Figure: entropy)*
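A minimal sketch of the entropy formula above, assuming binary labels where `p1` is the fraction of positive examples at a node (the helper name `entropy` is illustrative):

```python
import numpy as np

def entropy(p1):
    """Entropy H(p1) of a binary node, where p1 is the fraction of positive examples."""
    # Convention: a perfectly pure node (p1 = 0 or 1) has zero entropy.
    if p1 == 0 or p1 == 1:
        return 0.0
    p0 = 1 - p1
    return -p1 * np.log2(p1) - p0 * np.log2(p0)

print(entropy(0.5))  # 1.0 -> most impure
print(entropy(1.0))  # 0.0 -> pure
```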

How to Choose Which Split to Make

  • This is driven by a concept called Information Gain, which is calculated using Entropy
  • At a decision node, compute the weighted average of the entropy of the left and right branches. Information Gain is the entropy at the node minus this weighted average: `Gain = H(p1_node) - (w_left * H(p1_left) + w_right * H(p1_right))`. The split with the largest Information Gain (the biggest reduction in entropy) is the preferred split

Illustration below:

*(Figure: decision-tree Information Gain)*
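A minimal sketch of the Information Gain calculation, reusing the illustrative `entropy` helper from the sketch above; the split values here are made-up examples:

```python
def information_gain(y_root, y_left, y_right):
    """Entropy at the node minus the weighted average entropy of the two branches."""
    def p1(y):
        # Fraction of positive (label 1) examples in a branch
        return sum(y) / len(y) if len(y) else 0.0

    w_left = len(y_left) / len(y_root)
    w_right = len(y_right) / len(y_root)
    return entropy(p1(y_root)) - (w_left * entropy(p1(y_left)) + w_right * entropy(p1(y_right)))

# Made-up example: a 50/50 node split into a mostly-positive and a mostly-negative branch
y_root = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_left = [1, 1, 1, 1, 0]
y_right = [1, 0, 0, 0, 0]
print(information_gain(y_root, y_left, y_right))  # ~0.278
```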

Summary of Decision Tree Flow

  1. Start with all examples at Root Node
  2. Calculate Information Gain on all possible features and choose the one with highest gain
  3. Split data according to the selected feature
  4. Keep splitting recursively (see the sketch after this list) until any of these stopping criteria is met -
  • A node contains examples of only one class (100% pure)
  • Information Gain is less than a threshold value
  • Splitting would exceed the maximum tree depth decided beforehand
  • The number of examples in a node is below a threshold
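A minimal sketch of this flow using scikit-learn's DecisionTreeClassifier (an assumed library and dataset choice, not specified on this page); the stopping criteria above map roughly to the hyperparameters max_depth, min_samples_leaf, and min_impurity_decrease:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative tabular dataset for binary classification
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

tree = DecisionTreeClassifier(
    criterion="entropy",         # split on Information Gain (entropy reduction)
    max_depth=4,                 # stop when splitting would exceed the maximum depth
    min_samples_leaf=5,          # stop when a node has fewer examples than this
    min_impurity_decrease=0.01,  # stop when the gain is below this threshold
    random_state=42,
)
tree.fit(X_train, y_train)
print("Test accuracy:", tree.score(X_test, y_test))
```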

Tree Ensemble

  • Multiple independent decision trees vote, and the final decision is based on the majority
  • This is achieved by training each tree on a sample drawn with replacement from the training data (see the sketch below)
  • A typical number of sampling rounds (trees) is about 100; a larger ensemble usually does not provide significantly higher accuracy
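A minimal sketch of such an ensemble using scikit-learn's RandomForestClassifier (an assumed library and dataset choice); n_estimators corresponds to the number of sampling-with-replacement rounds mentioned above:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

forest = RandomForestClassifier(
    n_estimators=100,  # ~100 trees, each trained on its own bootstrap sample
    bootstrap=True,    # draw each tree's training sample with replacement
    random_state=42,
)
forest.fit(X_train, y_train)
print("Test accuracy:", forest.score(X_test, y_test))
```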

XGBoost

  • Full form: eXtreme Gradient Boosting
  • Algorithm that works with a Tree Ensemble by putting more focus on the examples misclassified in previous rounds
  • It reduces error sequentially - in contrast to Random Forest, which builds independent decision trees, XGBoost fits each new tree to correct the errors of the trees built before it
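A minimal sketch using the xgboost package's scikit-learn-style XGBClassifier (assuming xgboost is installed; the dataset choice is illustrative); n_estimators is the number of boosting rounds, with each new tree fit to correct the errors of the previous ones:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = XGBClassifier(
    n_estimators=100,   # number of boosting rounds (trees fit one after the other)
    learning_rate=0.1,  # how strongly each new tree corrects previous errors
    max_depth=4,
)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```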

Decision Tree Vs. Neural Network

| Decision Tree | Neural Network |
| --- | --- |
| Works well on structured data | Works well on structured and unstructured data |
| Recommended for tabular data | Recommended for speech, text, and video type data |
| Faster processing | Slower processing |
| Human interpretable | Not easy for humans to interpret |
| Can't leverage transfer learning | Transfer learning can help improve accuracy |
| Mostly works as one model for one system | Multiple networks can be strung together in a system to build with multiple models |

Example Code